Deductible imputation in administrative medical claims datasets
Objective: To validate imputation methods used to infer plan-level deductibles and determine which enrollees are in high-deductible health plans (HDHPs) in administrative claims datasets. Data sources and study setting: 2017 medical and pharmaceutical claims for US individuals from the OptumLabs Data Warehouse. Study design: We impute plan deductibles using four methods: (1) parametric prediction using individual-level spending; (2) parametric prediction with imputation and plan characteristics; (3) the highest plan-specific mode of individual annual deductible spending; and (4) deductible spending at the 80th percentile among individuals meeting their deductible. We compare the levels and categories of imputed versus actual deductibles. Data collection/extraction methods: Not applicable. Principal findings: All methods had a positive predictive value (PPV) of ≥87% for distinguishing high- from low-deductible plans; negative predictive values (NPVs) were lower. The method imputing plan-specific modes of deductible spending was the most accurate and least computationally intensive (PPV: 95%; NPV: 91%). This method also correlated best with actual deductible levels: 69% of imputed deductibles were within $250 of the true deductible. Conclusions: In the absence of plan structure data, imputing plan-specific modes of individual annual deductible spending best correlates with true deductibles and best predicts which enrollees are in HDHPs.
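To make method (3) concrete, here is a minimal Python sketch. It assumes claims have already been aggregated to one (plan ID, annual deductible spending) pair per enrollee. It reads "highest plan-specific mode" as breaking ties between equally frequent values by taking the largest. It also uses a hypothetical HDHP cutoff of $1,300 (the 2017 IRS self-only minimum), although the study's own cutoff may differ. The function names, the cutoff, and the example records are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical HDHP cutoff (2017 IRS self-only minimum); the study may use another.
HDHP_THRESHOLD = 1300.0


def impute_plan_deductibles(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Method (3) sketch: per plan, the most frequent individual annual
    deductible spending value; ties go to the highest value (our reading)."""
    counts: Dict[str, Counter] = {}
    for plan_id, spend in records:
        # Round to cents so float noise does not split identical amounts.
        counts.setdefault(plan_id, Counter())[round(spend, 2)] += 1
    imputed = {}
    for plan_id, c in counts.items():
        top = max(c.values())
        imputed[plan_id] = max(v for v, n in c.items() if n == top)
    return imputed


def is_hdhp(deductibles: Dict[str, float], threshold: float = HDHP_THRESHOLD) -> Dict[str, bool]:
    """Classify each plan as high-deductible if its deductible meets the cutoff."""
    return {k: d >= threshold for k, d in deductibles.items()}


def ppv_npv(predicted: Dict[str, bool], actual: Dict[str, bool]) -> Tuple[float, float]:
    """PPV = TP / (TP + FP); NPV = TN / (TN + FN). NaN if a denominator is zero."""
    tp = sum(predicted[k] and actual[k] for k in predicted)
    fp = sum(predicted[k] and not actual[k] for k in predicted)
    tn = sum(not predicted[k] and not actual[k] for k in predicted)
    fn = sum(not predicted[k] and actual[k] for k in predicted)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Made-up example data: (plan ID, individual annual deductible spending).
records = [("A", 1500), ("A", 1500), ("A", 300),
           ("B", 500), ("B", 500), ("B", 2000),
           ("C", 1000), ("C", 3000)]
imputed = impute_plan_deductibles(records)
true_deductibles = {"A": 1500.0, "B": 500.0, "C": 1000.0}
print(imputed)
print(ppv_npv(is_hdhp(imputed), is_hdhp(true_deductibles)))  # (0.5, 1.0)
```

Plan C shows the tie-break at work. The values 1000 and 3000 each occur once, so the sketch imputes 3000. That misclassifies C as an HDHP and lowers the PPV to 0.5.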
A User-Friendly Hybrid Sparse Matrix Class in C++
When implementing functionality which requires sparse matrices, there are
numerous storage formats to choose from, each with advantages and
disadvantages. To achieve good performance, several formats may need to be used
in one program, requiring explicit selection and conversion between the
formats. This can be both tedious and error-prone, especially for non-expert
users. Motivated by this issue, we present a user-friendly sparse matrix class
for the C++ language, with a high-level application programming interface
deliberately similar to the widely used MATLAB language. The class internally
uses two main approaches to achieve efficient execution: (i) a hybrid storage
framework, which automatically and seamlessly switches between three underlying
storage formats (compressed sparse column, coordinate list, Red-Black tree)
depending on which format is best suited for specific operations, and (ii)
template-based meta-programming to automatically detect and optimise execution
of common expression patterns. To facilitate relatively quick conversion of
research code into production environments, the class and its associated
functions provide a suite of essential sparse linear algebra functionality
(e.g., arithmetic operations, submatrix manipulation) as well as high-level
functions for sparse eigendecompositions and linear equation solvers. The
latter are achieved by providing easy-to-use abstractions of the low-level
ARPACK and SuperLU libraries. The source code is open and provided under the
permissive Apache 2.0 license, allowing unencumbered use in commercial
products.
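The abstract does not show the class's conversion code. To illustrate one conversion that a hybrid storage framework has to perform, the following pure-Python sketch turns coordinate-list (row, column, value) triplets into the three compressed sparse column arrays. The function name is hypothetical. Summing duplicate entries and dropping resulting zeros are assumptions of this sketch, not documented behaviour of the C++ class. A plain dict followed by a sort stands in for ordered element storage.

```python
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[int, int, float]


def coo_to_csc(n_rows: int, n_cols: int,
               triplets: Sequence[Triplet]) -> Tuple[List[float], List[int], List[int]]:
    """Convert COO triplets to CSC arrays (values, row_indices, col_ptrs).

    Duplicate (row, col) entries are summed and resulting zeros are dropped
    (an assumption of this sketch)."""
    acc: Dict[Tuple[int, int], float] = {}
    for r, c, v in triplets:
        if not (0 <= r < n_rows and 0 <= c < n_cols):
            raise IndexError(f"entry ({r}, {c}) outside {n_rows}x{n_cols} matrix")
        # Key by (col, row) so that sorting yields column-major order.
        acc[(c, r)] = acc.get((c, r), 0.0) + v

    values: List[float] = []
    row_indices: List[int] = []
    col_ptrs = [0] * (n_cols + 1)
    for c, r in sorted(acc):
        v = acc[(c, r)]
        if v == 0.0:
            continue
        values.append(v)
        row_indices.append(r)
        col_ptrs[c + 1] += 1  # count stored entries per column

    # Prefix sum: entries of column j live at positions col_ptrs[j] .. col_ptrs[j + 1] - 1.
    for j in range(n_cols):
        col_ptrs[j + 1] += col_ptrs[j]
    return values, row_indices, col_ptrs


vals, rows, ptrs = coo_to_csc(3, 3, [(0, 0, 1.0), (2, 0, 2.0), (1, 2, 3.0), (1, 2, 4.0)])
print(vals, rows, ptrs)  # [1.0, 2.0, 7.0] [0, 2, 1] [0, 2, 2, 3]
```

CSC suits column access and matrix-vector products, while coordinate or tree forms make inserting individual elements cheap. That asymmetry is why a hybrid framework benefits from converting between them automatically.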
Bayesian spatial extreme value analysis of maximum temperatures in County Dublin, Ireland
In this study, we begin a comprehensive characterisation of temperature
extremes in Ireland. We produce return levels of anomalies of daily maximum
temperature extremes over an area of Ireland for the 30-year period 1981-2010.
We employ extreme value theory (EVT) to model the
data using the generalised Pareto distribution (GPD) as part of a three-level
Bayesian hierarchical model. We use predictive processes in order to solve the
computationally difficult problem of modelling data over a very dense spatial
field. To our knowledge, this is the first study to combine predictive
processes and EVT in this manner. The model is fit using Markov chain Monte
Carlo (MCMC) algorithms. Posterior parameter estimates and return level
surfaces are produced, in addition to specific site analysis at synoptic
stations, including Casement Aerodrome and Dublin Airport. Observational data
from the period 2011-2018 are included in this site analysis to determine whether
there is evidence of a change in the observed extremes. An increase in the
frequency of extreme anomalies, but not the severity, is observed for this
period. We found that the frequencies of observed extreme anomalies from
2011-2018 at the Casement Aerodrome and Phoenix Park synoptic stations exceed
the upper bounds of the credible intervals from the model by 20% and 7%,
respectively.
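Return levels from a GPD fit to threshold exceedances follow a standard peaks-over-threshold formula. Let u be the threshold, zeta_u the probability that a daily value exceeds u, sigma the scale, xi the shape, and n_y the number of observations per year. The N-year return level is then u + (sigma / xi) * ((N * n_y * zeta_u)^xi - 1) for xi not equal to 0, and u + sigma * log(N * n_y * zeta_u) for xi = 0. The minimal Python sketch below evaluates this for each MCMC draw of (sigma, xi) at one site and summarises the draws with a posterior median and an equal-tailed credible interval. It does not reproduce the paper's spatial hierarchical model or predictive processes. The function names, threshold, exceedance probability and draws are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gpd_return_level(u: float, sigma: float, xi: float, zeta_u: float,
                     n_years: float, obs_per_year: float = 365.25) -> float:
    """N-year return level for a GPD fitted to exceedances of threshold u.

    m = N * n_y * zeta_u is the expected number of exceedances of u in N years;
    the formula is meaningful for m > 1."""
    m = n_years * obs_per_year * zeta_u
    if abs(xi) < 1e-9:
        return u + sigma * math.log(m)  # exponential-tail limit as xi -> 0
    return u + (sigma / xi) * (m ** xi - 1.0)


def percentile(sorted_vals: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile of already-sorted values, p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def posterior_return_level(draws: Sequence[Tuple[float, float]], u: float,
                           zeta_u: float, n_years: float,
                           level: float = 0.95) -> Tuple[float, float, float]:
    """Posterior median and equal-tailed credible interval of the return level,
    given MCMC draws of (sigma, xi) at a single site."""
    levels: List[float] = sorted(
        gpd_return_level(u, s, x, zeta_u, n_years) for s, x in draws)
    alpha = (1.0 - level) / 2.0
    return (percentile(levels, 0.5),
            percentile(levels, alpha),
            percentile(levels, 1.0 - alpha))


# Hypothetical posterior draws of (sigma, xi) for temperature anomalies above u = 3.0.
draws = [(1.8, -0.2), (2.0, -0.15), (2.2, -0.1), (1.9, -0.12)]
print(posterior_return_level(draws, u=3.0, zeta_u=0.05, n_years=50))
```

With a negative shape parameter, as is typical for temperature maxima, the GPD has a finite upper endpoint u - sigma / xi. Long-period return levels therefore level off rather than grow without bound.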
Assessing United States county-level exposure for research on tropical cyclones and human health
Includes bibliographical references (pages 067007-12 to 067007-13). Background: Tropical cyclone epidemiology can be advanced through exposure assessment methods that are comprehensive and consistent across space and time, as these facilitate multiyear, multistorm studies. Further, an understanding of patterns in and between exposure metrics based on specific storm hazards can help in designing tropical cyclone epidemiological research. Objectives: a) Provide an open-source data set for tropical cyclone exposure assessment for epidemiological research; and b) investigate patterns and agreement between county-level assessments of tropical cyclone exposure based on different storm hazards. Methods: We created an open-source data set with county-level data on exposure to four tropical cyclone hazards: peak sustained wind, rainfall, flooding, and tornadoes. The data cover all eastern U.S. counties for all land-falling or near-land Atlantic basin storms, spanning 1996–2011 for all metrics and up to 1988–2018 for specific metrics. We validated measurements against other data sources and investigated patterns and agreement among binary exposure classifications based on these metrics. We also compared these classifications with exposure defined by distance from the storm’s track, which has been used as a proxy for exposure in some epidemiological studies. Results: Our open-source data set was typically consistent with data from other sources, and we present and discuss areas of disagreement and other caveats. Over the study period and area, tropical cyclones typically brought different hazards to different counties. Consequently, agreement between exposure assessments based on different hazard-specific metrics was usually low, as was agreement between the distance-based proxy and any of the hazard-specific metrics.
Discussion: Our results provide a multihazard data set that can be leveraged for epidemiological research on tropical cyclones, as well as insights that can inform the design and analysis of tropical cyclone epidemiological research.
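The abstract does not name the statistic used to measure agreement between binary exposure classifications. One common choice for two yes/no classifications of the same counties is Cohen's kappa, which discounts the agreement expected by chance. The sketch below assumes that choice. The wind-based and distance-based classifications in the example are made up for illustration.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary exposure classifications of the same units."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty classifications of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n  # share classified "exposed" by each metric
    p_expected = pa * pb + (1 - pa) * (1 - pb)
    if p_expected == 1.0:
        # Both metrics are constant and identical; kappa is undefined, and we
        # return 1.0 by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical exposure flags for four counties from two metrics.
wind_exposed = [True, True, False, False]
distance_exposed = [True, False, False, False]
print(cohens_kappa(wind_exposed, distance_exposed))  # 0.5
```

Here the two metrics agree on three of four counties (75%), but half of that agreement is expected by chance, so kappa is only 0.5.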
Improving gene-set enrichment analysis of RNA-Seq data with small replicates
Deregulated pathways identified from transcriptome data of two sample groups have played a key role in many genomic studies. Gene-set enrichment analysis (GSEA) has been commonly used for pathway or functional analysis of microarray data, and it is also being applied to RNA-seq data. However, most RNA-seq datasets so far have only a small number of replicates. This forces the use of the gene-permuting GSEA method (or preranked GSEA), which produces a large number of false positives because of inter-gene correlation within each gene set. We demonstrate that incorporating the absolute gene statistic into one-tailed GSEA considerably improves both false-positive control and the overall discriminatory ability of gene-permuting GSEA methods for RNA-seq data. To test performance, we developed a new simulation method that generates correlated read counts within a gene set and compared a dozen currently available RNA-seq enrichment analysis methods; the proposed methods outperformed those that do not account for inter-gene correlation. Analysis of real RNA-seq data also supported the proposed methods in terms of false-positive control, ranks of true positives, and biological relevance. An efficient R package (AbsFilterGSEA) coded in C++ (Rcpp) is available from CRAN.
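The simplified Python sketch below illustrates the core idea of scoring a gene set with absolute gene statistics in a one-tailed, gene-permuting test. It uses a GSEA-style weighted running sum over genes ranked by |statistic| and keeps only the maximum positive deviation. Significance is estimated by drawing random gene sets of the same size, which is equivalent to permuting gene labels. It does not reproduce AbsFilterGSEA's filtering step, its exact statistic, or its R/Rcpp interface. The function names and the example data are hypothetical.

```python
import random
from typing import Dict, Sequence, Set


def enrichment_score(abs_sorted: Sequence[float], in_set: Sequence[bool],
                     weight: float = 1.0) -> float:
    """One-tailed, GSEA-style weighted running-sum statistic.

    abs_sorted: absolute gene statistics in decreasing order.
    in_set: parallel flags marking members of the gene set.
    Only the maximum positive deviation is kept (one-tailed): with absolute
    statistics, both up- and down-regulated genes rank near the top."""
    n_hit = sum(in_set)
    n_miss = len(in_set) - n_hit
    norm = sum(s ** weight for s, h in zip(abs_sorted, in_set) if h)
    if n_hit == 0 or n_miss == 0 or norm == 0:
        raise ValueError("gene set must be a non-empty proper subset with non-zero statistics")
    running = best = 0.0
    for s, h in zip(abs_sorted, in_set):
        running += (s ** weight) / norm if h else -1.0 / n_miss
        best = max(best, running)
    return best


def gene_permutation_pvalue(stats: Dict[str, float], gene_set: Set[str],
                            n_perm: int = 1000, seed: int = 0) -> float:
    """Compare the observed score with scores of random gene sets of equal size."""
    genes = sorted(stats, key=lambda g: abs(stats[g]), reverse=True)
    abs_sorted = [abs(stats[g]) for g in genes]
    flags = [g in gene_set for g in genes]
    observed = enrichment_score(abs_sorted, flags)
    k = sum(flags)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        picked = set(rng.sample(range(len(genes)), k))
        null = enrichment_score(abs_sorted, [i in picked for i in range(len(genes))])
        exceed += null >= observed
    return (exceed + 1) / (n_perm + 1)


# Hypothetical gene statistics with alternating signs; the gene set holds the
# three genes with the largest |statistic| (one of them down-regulated).
stats = {f"g{i}": float(20 - i) * (-1) ** i for i in range(20)}
print(gene_permutation_pvalue(stats, {"g0", "g1", "g2"}, n_perm=200))
```

Random gene sets of equal size do not preserve the inter-gene correlation of a real gene set. This is the source of the false positives that the abstract attributes to gene-permuting GSEA.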
- …